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Fast, Accurate Second Order Methods for Network Optimization 


Rasul Tutunov, Haitham Bou Ammar, and Ali Jadbabaie 


Abstract — Dual descent methods are commonly used to solve 
network flow optimization problems, since their implementation 
can be distributed over the network. These algorithms, however, 
often exhibit slow convergence rates. Approximate Newton 
methods which compute descent directions locally have been 
proposed as alternatives to accelerate the convergence rates of 
conventional dual descent. The effectiveness of these methods, is 
limited by the accuracy of such approximations. In this paper, 
we propose an efficient and accurate distributed second order 
method for network flow problems. The proposed approach 
utilizes the sparsity pattern of the dual Hessian to approximate 
the the Newton direction using a novel distributed solver for 
symmetric diagonally dominant linear equations. Our solver is 
based on a distributed implementation of a recent parallel solver 
of Spielman and Peng (2014). We analyze the properties of 
the proposed algorithm and show that, similar to conventional 
Newton methods, superlinear convergence within a neighbor¬ 
hood of the optimal value is attained. We finally demonstrate 
the effectiveness of the approach in a set of experiments on 
randomly generated networks. 

I. INTRODUCTION 

Conventional methods for distributed network optimiza¬ 
tion are based on sub-gradient descent in either the primal 
or dual domains, see [8], [9], [10], [13]. For a large class 
of problems, these techniques yield iterations that can be 
implemented in a distributed fashion by only using local 
information. Their applicability, however, is limited by in¬ 
creasingly slow convergence rates. Second order Newton 
methods [3], [4] are known to overcome this limitation 
leading to improved convergence rates. 

Unfortunately, computing exact Newton directions based 
only on local information is challenging. Specifically, to 
determine the Newton direction, the inverse of the dual Hes¬ 
sian is needed. Determining this inverse, however, requires 
global information. Consequently, authors in [5], [6] pro¬ 
posed approximate algorithms for determining these Newton 
iterates in a distributed fashion. Accelerated Dual Descent 
(ADD) [6], for instance, exploits the fact that the dual Hes¬ 
sian is the weighted Laplacian of the network and performs 
a truncated Neumann expansion of the inverse to determine 
a local approximate to the exact direction. ADD allows for a 
tradeoff between accurate Hessian approximations and com¬ 
munication costs through the N-Hop design, where increased 
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N allows for more accurate inverse approximations arriving 
at increased cost, and lower values of N reduce accuracy 
but improve computational times. Though successful, the 
effectiveness of these approaches highly depend on the 
accuracy of the truncated Hessian inverse which is used to 
approximate the Newton direction. As shown in Section [VI] 
the approximated iterate can resemble high variation to the 
real Newton direction, decreasing the applicability of these 
techniques. 

Exploiting the sparsity pattern of the dual Hessian, in 
this paper we tackle the above problem and propose a 
Newton method for network optimization that is both faster 
and more accurate. Using recently-developed solvers for 
symmetric diagonally dominant (SDDM) linear equations, 
we approximate the Newton direction up-to any arbitrary 
precision e > 0. The solver is a distributed implementation 
of [11] constructing what is known as an inverse chain. We 
analyze the properties of the proposed algorithm and show 
that, similar to conventional Newton methods, superlinear 
convergence within a neighborhood of the optimal value 
is attained. We finally demonstrate the effectiveness of the 
approach in a set of experiments on randomly generated 
networks. Namely, we show that our method is capable of 
significantly outperforming state-of-the-art methods in both 
the convergence speeds and in the accuracy of approximating 
the Newton direction. 

The remainder of the paper is organized as follows. 
Section [TT] draws upon background material needed for the 
remainder of the paper. Section uni defines the network 
flow optimization problem targeted in this paper. Section [IV] 
details our proposed distributed solver for SDDM linear sys¬ 
tems. Section [V] introduces the approximate Newton method 
and rigorously analyzes its theoretical guarantees. Section [VTl 
presents the experimental results. Finally, Section IVIII con¬ 
cludes pointing-out interesting directions for future research. 

II. BACKGROUND 
A. SDDM Linear Systems 

To determine the Newton direction, we need to solve a 
symmetric diagonally dominant system of linear equations, 
defined as: 

M 0 x = b 0 (1) 

where Mq is a Symmetric Diagonally Dominant M-Matrix 
(SDDM). Namely, Mq is symmetric positive definite with 
non-positive off diagonal elements, such that for all i = 
1,2 

n 

[M 0 ] u >- Y, [ M °b 


The system of Equations in Q] can be interpreted as repre¬ 
senting an undirected weighted graph, Q, with Md being its 
Laplacian. Namely, Q = (A f,£,W), with JV representing 
the set of nodes, £ denoting the edges, and W representing 
the weighted graph adjacency. Nodes v t and v 3 are connected 
with an edge e = (i,j) iff XV l3 > 0, where: 

Wij = [M 0 ] u (if i = j), or = - [Mo]^- , otherwise. 

Following [11], we seek e-approximate solutions to x*, being 
the exact solution of MqX = 6 (J , defined as: 

Definition 1: Let x * £ R n be the solution of Mx = b 0 . 

A vector x £ R" is called an e— approximate solution, if: 

II®* - ®Hmo < e||®*IUo > where IMllio = uTm ou. 

( 2 ) 

The R-hop neighbourhood of node Vk is defined as 
N r (rtfc) = {v £ Af : dist(i>fe,r;) < r }. We also make 
use of the diameter of a graph, Q , defined as diam ( Q ) = 
i nax„ ; u, ea/* dist ( v^ , Vj ). 

Definition 2: A matrix A £ R nxra is said to have a 
sparsity pattern corresponding to the R-hop neighborhood 
if Ay = 0 for all i = 1 and for all j such that 

Vj N r (tti). 

We will denote the spectral radius of a matrix A by 
p (A) = max A,|, where A* represents an eigenvalue of the 
matrix A. Furthermore, we will make use of the condition 
numbefl k (A) of a matrix A defined as n = 

In [?] it is shown that the condition number of the graph 
Laplacian is at most O In 3 ], where W max and XV m i„ 
represent the largest and the smallest edge weights in Q. Fi¬ 
nally, the condition number of a sub-matrix of the Laplacian 
is at most O (ji 4 ^, see [11]. 

B. Standard Splittings & Approximations 

For determining the Newton direction, we propose a fast 
distributed solver for symmetric diagonally dominant linear 
equations. Our approach is based on a distributed imple¬ 
mentation of the parallel solver of Spielman and Peng [11]. 
Before detailing the parallel solver, however, we next provide 
basic notions and notations required. 

Definition 3: The standard splitting of a symmetric matrix 
Mq is: 

Mq = D 0 - A 0 . (3) 

Here, D o is a diagonal matrix such that [Do\ it = [Mo]- 
for i = 1,2and Ao representing a non-negative 
symmetric matrix such that [Aq]^ = — [Mq]^ if * ^ j, 
and [Ao] w = 0. 

We also define the Loewner ordering: 

Definition 4: Let Spp be the space of n x n-symmetric 
matrices. The Loewner ordering A is a partial order on 
<S(„) such that Y A X if and only if X — Y is positive 
semidefinite. 

Finally, we define the “~ a ” operation used in the sequel 
to come as: 

1 Please note that in the case of the graph Laplacian, the condition number 
is defined as the ratio of the largest to the smallest nonzero eigenvalues. 


WA) 

Ami n (A) 


Definition 5: Let X and Y be positive semidefinite sym¬ 
metric matrices. Then X ss a Y if and only iff 

e~ a X A Y A e“X (4) 


with A A B meaning B — A is positive semidefinite. 

Based on the above definitions, the following lemma 
represents the basic characteristics of the operator: 

Lemma 1: [11] Let X ,Y , Z and, Q be symmetric pos¬ 

itive semi definite matrices. Then 

(1) If X « a Y, then X + Z Y + Z, (2) If 

X Y and Z Q, then X + Z « Q Y + Q 

(3) If X » Q Y and Z Q, then X + Z « a Y + Q, 

(4) If X « ai Y and Y m a2 Z , then X « ai+Q2 Z 

(5) If X, and Y are non singular and X ss a Y, then 
X- 1 « Q Y~ x , (6) If X « a Y and V is a matrix, 
then V T XV V T YV 

The next lemma shows that good approximations of Mf 1 
guarantee good approximated solutions of MqX = bo. 

Lemma 2: Let Zq Mf 1 , and x = Zifti,. Then x is 
\/2 f (e e — 1) approximate solution of Mqx = bo. 

Proof: The proof can be found in the appendix. ■ 


C. The Parallel SDDM Solver 


The parallel SDDM solver proposed in [11] is a paral¬ 
lelized technique for solving the problem of Section III-AI It 
makes use of inverse approximated chains (see Definition [6]) 
to determine x and can be split in two steps. In the first 
step, denoted as Algorithm [U a “crude” approximation, Xq, 
of x is returned. Xo is driven to the e-close solution, x, 
using Richardson Preconditioning in Algorithm [2] Before 
we proceed, we start with the following two Lemmas which 
enable the definition of inverse chain approximation. 

Lemma 3: [11] If M = D~ A is an SDDM matrix, with 
D being positive diagonal, and A denoting a non-negative 
symmetric matrix, then D — AD 1 A is also SDDM. 

Lemma 4: [11] Let M = D — A be an SDDM matrix, 
where D is positive diagonal and, A a symmetric matrix. 
Then 


(■ d-a r 1 


D 1 + (T + D~ 4 A) (D — AD~ l A) 1 

(5) 


(I + AD” 1 ) . 

Given the results in Lemmas [3] and 0] we now can consider 
inverse approximated chains of Mo: 

Definition 6: Let C = {Mo, Mi,..., MJ be a collec¬ 
tion of SDDM matrices such that M, = D, — Ai, with D, 
a positive diagonal matrix, and A, denoting a non-negative 
symmetric matrix. Then C is an inverse approximated chain if 
there exists positive real numbers eo, ei,..., e<i such that: (1) 
For i = 1,..., d: Di — A,; « ei _i Di -1 ~ Aj-iDjl^Aj-i, 
(2) D t A-i. and (3) D d « £d D d - A d . 

The quality of the “crude” solution returned by Algo¬ 
rithm [I] is quantified in the following lemma: 

Lemma 5: [11] Let {Mo, Mi,..., Mi} be the inverse 

approximated chain and denote Zo be the operator defined by 










Algorithm 1 ParallelRSolve (Mo, Mi ,..., M d , bo) 

1: Input: Inverse approximated chain, 

{M 0 , Mi,..., M d }, and b 0 being 
2: Output: The “crude” approximation, Xq, of x* 

3: for i = 1 to d do 

4 : bi = (I + i 

5: end for 

6 : X d = D^bd 

7: for i = d — 1 to 0 do 

8: Xi = \ [£>r 1 6 i + (I + D~ 1 A i ) x i+ i] 

9: end for 
10 : return x 0 


ParallelRSolve (Mo, Mi,..., M dl bo), namely, Xo = Zobo- 
Then 

Zo ^ f Mo 1 ( 6 ) 

Algorithm Q] returns a “crude” solution to M 0 x = b. To 
obtain arbitrary close solutions, Spielman et. al [11] intro¬ 
duced the preconditioned Richardson iterative scheme, sum¬ 
marized in Algorithm [2] Following their analysis. Lemma [6] 
provides the iteration count needed by Algorithm [2] to arrive 
at x. 


Algorithm 2 ParallelESolve (Mo, Mi,..., M d , bo, e) 

1: Input: Inverse approximated chain {Mo, Mi,..., M d }, 
bo, and e. 

2: Output: e close approximation, x, of x* 

3: Initialize: y 0 = 0; 

X = ParallelRSolve (Mo, Mi,.. ., M d , bo) (i.e.. Algo¬ 
rithm [T]) 

4: for k = 1 to q do 
5 : = MoUk-l 

6: Uf, 2 ' 1 = ParallelRSolve ^M 0 , Mi,..., M d , 

( 2 ) 

1-- Vk = Vk-1 ~ u k + x 

8: end for 
9: x = y q 

10: return x 


Lemma 6: [11] Let {Mo, M\ ... M d ] be an inverse 

approximated chain such that { In 2. Then 

ParallelESolve (M 0 , Mi ,..., M d , b 0 , e) arrives at an e close 
solution of x* in q = O (log i) iterations. 

III. NETWORK FLOW OPTIMIZATION 

We consider a network represented by a directed graph 
Q = (N,£) with node set M = {1 and edge 

set £ = {1,...,S}. The flow vector is denoted by x = 
[a: e ] e6 £> w i’ ; h x ^ representing the flow on edge e. The 
flow conservation conditions at nodes can be compactly 
represented as 

Ax = b, 

where A is the TV x E node-edge incidence matrix of Q 
defined as 


1 if edge j leaves node i 
— 1 if edge j enters node i 
0 otherwise. 


and the vector b £ 1^- denotes the external source, i.e., 
&W > 0 (or < 0) indicates units of external flow 
enters (or leaves) node i. A cost function <i> e : R —> M 
is associated with each edge e. Namely, ^> e (a: ( - e - ) ) denotes 
the cost on edge e as a function of the edge flow x^ e \ 
We assume that the cost functions <t> e are strictly convex 
and twice differentiable. Consequently, the minimum cost 
networks optimization problem can be written as 


E 

miny^ $ e (x( e ^) (7) 

e=l 

s.t. Ax = b 


Our goal is to investigate Newton type methods for solving 
the problem in [7] in a distributed fashion. Before diving 
into these details, however, we next present basic ingredients 
needed for the remainder of the paper. 

A. Dual Subgradient Method 

The dual subgradient method optimizes the problem in 
Equation [7] by descending in the dual domain. The La- 
grangian, l : x R w —> R is given by 

E 

l(X, A) = — $e(x {e) ) + \ t (Ax - b). 

e=l 

The dual function q( A) is then derived as 
q( A) = inf l(x. A) 

cc6R e 

= inf ( — $ e (x^ e - > ) + A T Aa; ] A T b 

.SR® y ^ J 

= Y inf f-# e (x (e) ) + (A T A) (e) x (e)N ) -A T b. 

e=l 

Hence, it can be clearly seen that the evaluation of the dual 
function q( A) decomposes into E one-dimensional optimiza¬ 
tion problems. We assume that each of these optimization 
problems have an optimal solution, which is unique by the 
strict convexity of the functions Denoting the solutions 
by x <e '> (A) and using the first order optimality conditions, it 
can be seen that for each edge, e, x^ (A) is given 

x^(A) = [6 e ]- 1 (A«-A«), (8) 

where i £ Af and j £ Af denote the source and destining 
nodes of edge e = (i,j), respectively (see [6] for details). 
Therefore, for an edge e, the evaluation of j;7') (A) can be 
performed based on local information about the edge’s cost 
function and the dual variables of the incident nodes, i and 

j- 


’Note that if the dual is not continuously differentiable, the a generalized 
Hessian can be used. 











The dual problem is defined as max AeR jv q{ A). Since the 
dual function is convex, the optimization problem can be 
solved using gradient descent according to 

\ k +i = A k - a k g k for all k > 0, (9) 

with k being the iteration index, and g k = g (X k ) = Vq(A k ) 
denoting the gradient of the dual function evaluated at A = 
A/, . Importantly, the computation of the gradient can be 
performed as g k = Ax (X k ) — b, with x(\ k ) being a vector 
composed of (X k ) as determined by Equation[8] Further, 
due to the sparsity pattern of the incidence matrix A, the i th 
element, , of the gradient g k can be computed as 

9k ] = E * W (**)- E x [e \X k )-b^. (10) 

Clearly, the algorithm in Equation [9] can be implemented 
in a distributed fashion, where each node, i, maintains 
information about its dual, and primal, x^ ( X k ), iterates 
of the outgoing edges e = Gradient components can 

then be evaluated as per [10] using only local information. 
Dual variables can then be updated using [9] Given the 
updated dual variables, the primal variables can be computed 
using [8] 

Although the distributed implementation avoids the cost 
and fragility of collecting all information at centralized lo¬ 
cation, practical applicability of gradient descent is hindered 
by slow convergence rates. This motivates the consideration 
of Newton methods discussed next. 

B. Newton’s Method for Dual Descent 

Newton’s method is a descent algorithm along a scaled 
version of the gradient. Its iterates are typically given by 

X k+1 = X k + a k d k for all k > 0, (11) 

with d k being the Newton direction at iteration k, and a k 
denoting the step size. The Newton direction satisfies 

H k d k g kl (12) 

with H k = H{X k ) = X7 2 q(X k ) being the Hessian of the 
dual function at the current iteration k. 

1) Properties of the Dual and Assumptions: Here, we 
detail some assumptions needed by our approach. We also 
derive essential Lemmas quantifying properties of the dual 
Hessian. 

Assumption 1: The graph, Q, is connected, non-bipartite 
and has algebraic connectivity lower bound by a constant uj. 

Assumption 2: The cost functions, ,,(■), in Equation [7] 
are 

1) twice continuously differentiable satisfying 

7 < ^e(-) < r, 

with 7 and T are constants; and 

2) Lipschitz Hessian invertible for all edges e £ £ 

1 1 


The following two lemmas [5], [6] quantify essential 
properties of the dual Hessian which we exploit through our 
algorithm to determine the approximate Newton direction. 

Lemma 7: The dual objective q( A) = A T (Aa;(A) — b) — 
$ e (at(A)) abides by the following two properties [?]: 

1) The dual Hessian, H(X), is a weighted Laplacian of 

G: 

H(X) = v 2 q( A) = A [V 2 /(x(A))] ~ 1 A J . 

2) The dual Hessian H( A) is Lispshitz continuous with 
respect to the Laplacian norm (i.e., || • Hr) where £ is 
the unweighted laplacian satisfying £ = AA J with A 
being the incidence matrix of Q. Namely, VA, A: 

||Tf(A)-ff(A)|| £ <B||A-A|| £ , 

with B = where p n (£) and /WjC) denote 

7 s/^2 (C) 

the largest and second smallest eigenvalues of the 
Laplacian £. 

Proof: See Appendix. ■ 

The following lemma follows from the above and is needed 
in the analysis later: 

Lemma 8: If the dual Hessian H (A) is Lipschitz continu¬ 
ous with respect to the Laplacian norm ||- ||r (i.e., Lemma[7]), 
then for any A and A we have 

IIVg(A) - Vg(A) - ff(A)(A - A)|U < |||A - X\\ 2 C . 

Proof: See Appendix. ■ 

As detailed in [6], the exact computation of the inverse 
of the Hessian needed for determining the Newton direction 
can not be attained exactly in a distributed fashion. Authors 
in [5], [6] proposed approximation techniques for computing 
this direction. The effectiveness of these algorithms, how¬ 
ever, highly depend on the accuracy of such an approxima¬ 
tion. In this work, we propose a distributed approximator for 
the Newton direction capable of acquiring e-close solutions 
for any arbitrary e. Our results show that this new algorithm 
is capable of significantly surpassing others in literature 
where its performance accurately traces that of the standard 
centralized Newton approach. Next, we detail our distributed 
SDD solver being at the core of our approximator. 

IV. SDD DISTRIBUTED SOLVERS 

We propose a distributed solver for SDDM systems which 
can be used to determine an approximation to the Newton 
direction up to any arbitrary e > 0 (see Section 0. Our 
method is based on a distributed implementation of the 
parallel solver of Section 1II-C1 Similar to [11], we first 
introduce an approximate inverse chain which can be com¬ 
puted in a distributed fashion. This leads us to a distributed 
version of the “crude” solver (i.e.. Algorithm III-Cl l. Contrary 
to [11], however, we then generalize the “crude” distributed 
solver to acquire exact solutions to an SDDM system. For a 
generic SDDM system of linear equations, our main results 
for determining an e-close solution (i.e., \\x — x* \\m, s < 
e|| £C *||Ar 0 ) ' s summarized b>0: 


&e{x) &e(x) 


< <5 |rc — jk| . 


’The complete proofs can be found at https://db.tt/MbBW15Zx 












Lemma 9: For the system of equations represented by 
MqX = b, there is a distributed algorithm that uses only 
R-Hop information and computes the e-close solution, x, 
in T(n, e) = O + /3d max R^j log Q)) time steps, 

with k(Mq) being the condition number of Mo, f3 = 
min < n, " lax _ 1 > representing the upper bound on the size 
of the R-Hop neighborhood, d max the maximal degree of Q, 
and e € (0, \] being the precision parameter. 

Analogous to [11], we will develop and analyze two 
distributed solvers for SDDM systems (i.e., “crude” R-Hop 
solver and “exact” R-Hop solver) leading to the proof of the 
above lemma. 

A. “Crude” R-Hop SDDM Solver 

Algorithm[3]presents the “crude” R-Hop solver for SDDM 

systems. Each node receives the k th row of Mo , k th 
component, [£>o]fc of bo, the length of the inverse chain, d, 
and the local communication bouncfli? as inputs, and outputs 
the k th component of the “rude” approximation of x*. 

Analysis of Algorithm [3] The following Lemma shows 
that RDistRSolve computes the k th component of the 
“crude” approximation of x* and provides the algorithm’s 
time complexity 

Lemma 10: Let Mo = Dq — Aq be the standard splitting 
and let Z' 0 be the operator defined by RDistRSolve, namely, 
Xq = Z^bo. Then, Z' {) Mf 1 . RDistRSolve requires 

O + /3Rd max y where /3 = min jn, }’ to 

arrive at Xq. 

Proof: See Appendix. ■ 

B. “Exact” Distributed R-Hop SDDM Solver 

Next, we provide the exact R-Hop solver. Similar to 
RDistRSolve, each node Vk receives the k th row Mo, [t>o]fc, 
d, R, and a precision parameter e as inputs, and outputs the 
k th component of the e close approximation of vector x*. 

Analysis of Algorithm [6} The following Lemma shows 
that EDistRSolve computes the k th component of the e 
close approximation to x * and provides the time complexity 
analysis. 

Lemma 11: Let Mq = Dq — Aq be the standard split¬ 
ting. Lurther, let ed < 1 /‘i In 2. Then Algorithm [6] requires 
O (log i) iterations to return the k th component of the e 
close approximation to x*. 

Proof: See Appendix. ■ 

Next, the following Lemma provides the time complexity 
analysis of EDistRSolve. 

Lemma 12: Let M 0 = D 0 — A 0 be the standard split¬ 
ting and let e d < 1 /3ln2, then EDistRSolve requires 

O ((2 d /R/3 + /3Rd ma x) log (Ye)) time steps. Moreover, for 
each node vp : , EDistRSolve only uses information from the 
R-hop neighbors. 

Proof: See Appendix. ■ 

The complexity of the proposed algorithms depend on the 
length of the inverse approximated chain, d. Here, we provide 

4 For simplicity, R is assumed to be in the order of powers of 2, i.e., 
R = 2P. 


Algorithm 3 RDistRSolve ({[M 0 ] fc i,..., [M 0 ] kn }, [b Q \ k , d, R ) 

Part One: 


{{A 0 DQ 1 ) kl ,...,[A 0 DQ 1 ] kn } = ■■■>$&}’ 

{[Do^olfci, ■ • •, [D^A 0 } kn } = {|^i 


. } 

[Co]fci> • • • j [Co]fcn = Comp 0 ([M 0 ]fci,..., [M 0 ]fe n , f?), 

[<?!]«,...,[<?!] 

kn — Comp 1 ([M 0 ] fe i,...,[M 0 ] km R) 


kn 


Part Two: 
for i = 1 to d do 

if i — 1 < p 


Ini J k — [A 0 D 0 bp— 1]^ 


4 i_1)i 

for j = 2 to 2 I_1 do 

1 


= [AqD, 


o 


end for 

[bi]k = [bi~i]k + 
if i — 1 > p 
k -1 = 2i_1 /R 

[it^ _1) ] fc = [Co6i_i]fe 

for j = 2 to Zj_ i do 
[Wj* _1) ]fc = [CoufT* 

end for 


o -1 J fe 
(i-l)l 


[bi]k = [bi-i]k + [u h _ 1 ]k 

end for 


fc 


Part Three: 

[Xd]k = ^ k /[Do\kk 
for i = d — 1 to 1 do 
if i < p 

[Vi * +1) ]fc = l D o 1 A 0 x i+ i] k 

for j = 2 to 2 l do 

[^ +1) ]fc = [D- 1 Aqt,^] 


end for 


[ Do]kk 


+ [sEi+l]fc + [rfp }k 


if i > p 

h = 2 V fl 

[Vi +1) ]k = [CiX i+1 \k 
for j = 2 to k do 


(*+i)i 


h 

end for 


= 




[Xi\k = 

end for 

[xq ]fc = | 
return [aj 0 }, 


[Do]kk 


[ri^lk 


[Do\kk 

k 


+ [ x i+l]k 

[xi]k “I - [D 0 .Ao^i] 


an analysis to determine the value of d which guarantees 
e d < | In 2 in C = {A 0 , D 0 , A x , D lt ..., A d , D d }. These 
results are summarized the following lemma 

Lemma 13: Let Mq = Dq — Aq be the 
standard splitting and let n denote the condition 
number of Mq. Consider the inverse approximated 
chain C = {A 0 , Do, Ai, Z?i,..., A d , D d } with 

length d = [log ^2In «)l, then D 0 

Dq - Do (Df 1 A 0 )~ , with e d < t/3ln2. 

























Algorithm 4 Comp 0 ([M 0 ] kl ,..., [M 0 ] kn , R) _ 

for l = 1 to R — 1 do 

for j s.t.Vj £ Nj+i(vfe) do 

[(A„D^y+'] kj 

E l ^j-[(A 0 D^) l ] kr [A Q D^} jr 

ri) r 6 Ni(»j) 

end for 
end for 

return c 0 = , [(AqDq 1 ) 11 ]^} 


Algorithm 5 Comp^fMoJfci,..., [M 0 \ kn , R) 


determine d k by solving H k d k = -g k using Algorithm^ It 
is easy to see that our approximation of the Newton direction, 
dk, satisfies 

\\dk — dk\\H k <. e||dfc||fr fc 
with d k = -Z k g k: 

where Z k approximates H\. according to the routine of Al¬ 
gorithm [ 6 ] The accuracy of this approximation is quantified 
in the following Lemma 

Lemma 14: Let H k = H{\ k ) be the Hessian of the dual 
function, then for any arbitrary e > 0 we have 


for l = 1 to R . — 1 do 

for j s.t.Vj £ Ni + i(ufe) do 
[{Do 1 A 0 ) l+1 ] k . 

E [ ^- r [(D^A,y\ kr [D^A,]jr 

THVrGNl ( Vj) 

end for 
end for 

return a = {[(D^Aq) 11 }^, ..., [(D^ 1 A 0 ) R ] kn } 


Proof: See Appendix. ■ 

Combining the above results finalizes the proof of 
Lemma[9] The usage of this distributed solver to approximate 
the Newton direction, as detailed in the next section, en¬ 
ables fast and accurate distributed Newton methods capable 
of approximating centralized Newton directions up to any 
arbitrary e. 

V. FAST & ACCURATE DISTRIBUTED NEWTON 
METHOD 

Our approach only requires R-Hop communication for the 
distributed approximation of the Newton direction. Given 
the results of Lemma [7] we can determine the approximate 
Newton direction by solving a system of linear equations 
represented by an SDD matrix^ according to Section HY1 
with Mo = H k = ff(A fc ). 

Formally, we consider the following iteration scheme: 

E-+i = Afc + a k d k , (13) 

with k representing the iteration number, a k the step-size, 
and d k denoting the approximate Newton direction. We 

5 Due to space constraints, we refrain some of the proofs to the appendix. 


Algorithm 6 EDistRSolve ({[M Q ] fci ,..., [M 0 \ kn }, [ b 0 ] k ,d , R , e 
Initialize: [y 0 \ k = 0, and [x}k = 

RDistRSolve({ [Mf\ k \ ,..., [ M 0 \ kn }, [& 0 ]fc, d , R) 

for t = 1 to q do 

[' u t ]fc = [D 0 \kk[yt-i\k — lyt-tlj 

[u[ 2) ] k = RDistRSolve({[M 0 ]fei,..., [ M 0 \ kn }, [u ( t 1) ] k , d, R) 


e~ e v T H\v < v T Z k v < e e v T H\v, \/v £ 1^. 

Proof: See Appendix. ■ 

Given such an accurate approximation, next we analyze 
the iteration scheme of our proposed method showing that 
similar to standard Newton methods, we achieve superlinear 
convergence within a neighborhood of the optimal value. We 
start by analyzing the change in the Laplacian norm of the 
gradient between two successive iterations 

Lemma 15: Consider the following iteration scheme 
Afc + i = \ k + a k d k with a k £ ( 0 , 1 ], then, for any arbitrary 
e > 0, the Laplacian norm of the gradient, | \g k +i \\c, follows: 


\\9k+i\\c < 


1 — Ckfc + Ctfc£ 


M 2 (£) ' 


\\9k\\x 


(14) 


alBT\l + ef 2 

4 - ,2rr\ - \\9k\\ci 


2gl(C) 


with fi n (£) and g 2 {£) being the largest and second smallest 
eigenvalues of C, T and 7 denoting the upper and lower 
bounds on the dual’s Hessian, and B £ R is defined in 
Lemma [ 8 ] 

Proof: See Appendix. ■ 

At this stage, we are ready to present the main results quan¬ 
tifying the convergence phases exhibited by our approach: 


Theorem 1: Let 7, T, B be the constants defined in 
Assumption [2] and Lemma [7] g n (C) and 72 (L) representing 
the largest and second smallest eigenvalues of the normalized 
laplacian C, e £ ^ 0 , ^ P rec l s i° n parameter for 

the SDDM (Section iLVb solver, and letting the optimal step- 

size parameter a* = jj+ep (r vHjt \ ) ■ T ^ 611 the proposed 

algorithm given by the A/c+i = A/, + a*d k exhibits the 
following three phases of convergence: 

1) Strict Decreases Phase: While ||g fc || £ > Vi- 


?(Afc + i) - g(Afc) < 


1 e ~ 2e2 7 3 Pl(£) 2 
2 (l + e) 2 r 2 ^(C ) 11 1 


2) Quadratic Decrease Phase: While g 0 < \ \gk\\cV'\ '■ 


\\g k+ i\\c < —\\gk\\c- 

m 


[yt]k = [yt~i]k - [u[ 2) ] k + [x]fc 

end forend for 
return [x] k = [y q ] k 


3) Terminal Phase: When ||g fc ||£ < r/ (l : 
Ilfffc+ilU < a l-a*+a*e 


\ 


lln(C) /r 


M 2 (£) V 7 


\\9k\\c, 


























where rjo = ^ and 771 = A-*£, with 

4 = 


c = 


a 1 — a* + a*e 

B(a*T{l+e)) 2 
2 //!(£) 


/r 

M2(£) V 7 


(15) 


Proof: We will proof the above theorem by handling 
each of the cases separately. We start by considering the case 
when ||< 7 fc||£ > rji (i.e.. Strict Decrease Phase). We have: 

q(^k+ 1 ) = <z(A k) + g k (•Afc+i — A*) 

+ — Afc) T J?(z)( — A*) 


a 


— ?(Afc) + a k g k d k + —d k H(z)dk 
2 

—1— Qji - -1 ^ 

< g(A fc ) + a k g k d k + -±d k Cd k , 

27 

where the last steps holds since if(-) 

- 7 C. Noticing 

that \\d k \\ 2 c < HS£+*j-\\g k \\l (see Appendix), the only 

remaining step needed is to evaluate gjd k . Knowing that 
d k = ~Z k g k , we recognize 

gjdk = gj, Z k g k < er f2 gj. H 'lg k (Lemma E]) 


< — 


zQkdk < -- 


< - 


^ g n (C) 

e _e2 7 g T k £gk _ e _e2 7 


g k gk 


gn{C) g n {C) g 2 (C) 


\\gk\\c, 


where the last step follows from the fact that \/v e 


v' v > 


v Cv 
- M»( C) 


. Therefore, we can write 


q{X k+1 ) - q(\ k ) < - 


OL k 


7 2 r 2 (l + e) 2 


nl(£) 


— a 


2-fg 2 (£) 


WgkWl 


It is easy to see that a k = a* = 
minimizes the right-hand-side of the above equation. Using 
llff/cllz; gives the constant decrement in the dual function 
between two successive iterations as 

^ ^ i e- 2e2 7 3 gt(£) 2 

q ( k +i) q(\ k ) - 2 (l + e) 2 r 2 ^(re¬ 
considering the case when 770 < llflfclli 7 ?! (i.e., Quadratic 
Decrease Phase), Equation fl4l can be rewritten as 

llflfc+ilk < ^ 2 ll0fclk + C\\gk\\ 2 c, 

with 4 and 4 defined as in Equation[l5] Further, noticing that 
since \\g k \\ c > go then \\g k \\c < ^\\9k\\c = 

Consequently the quadratic decrease phase is finalized by 


\\9k+l\\c < C 


1-4 


+ 1 ) \\dk\\ 2 c — 


— — \\gk\\c- 

Vi 


Finally, we handle the case where < Vo (he.. 

Terminal Phase). Since ||< 7 /-||£ < 77o||flfe||£, it is easy to 
see that 


iiflfc+iiu < (4 2 + cvo)\\g k \\c = (e +^(i - m\9k\\c 

4llfl l fc|k = 4 1 — a* + a*e 


\ 


Pn(£) 


P2(C) V 7 


I \9k\\c- 


Having proved the three convergence phases of our algo¬ 
rithm, we next analyze the number of iterations needed by 
each phase. These results are summarized in the following 
lemma: 

Lemma 16: Consider the algorithm given by the following 
iteration protocol: A / r+1 = \ k+ i+a*d k . Let Ao be the initial 
value of the dual variable, and q* be the optimal value of 
the dual function. Then, the number of iterations needed by 
each of the three phases satisfy: 

1) The strict decrease phase requires the following 
number iterations to achieve the quadratic phase: 

r I--2 


TVi < c 1 


PniC? 

t4(£) 


1 - e 


g n {£) 
P2{£) V 7 


where C x = C x (e, 7 , T, <5, q(A 0 ), q*) = 2<5 2 (1 + 
e) 2 [q(X 0 )-q*]^. 

2) The quadratic decrease phase requires the following 
number of iterations to terminate: 



where r = ■A-||fl | fc'||£, with k! being the first iteration 
of the quadratic decrease phase. 

3) The radius of the terminal phase is characterized by: 


/Terminal 7 


1 _ r Mm(£) 

M2 (C) 


■g n {£.)^/g 2 (£). 


g-e 2 7 5 

Proof: See Appendix. 

Given the above result, the total 
sage complexity can then be derived 
O ((JVi + N 2 ) np ( K (Hfc)^ + f?d max ) log (A)). 


mes¬ 

as 


VI. EXPERIMENTS AND RESULTS 

We evaluated our approach on two randomly generated 
networks. The first consisted of 30 nodes and 70 edges, 
while the second contained 90 nodes with 200 edges. The 
edges were chosen uniformly at random. The flow vectors, 
b, were chosen to place source and sink nodes diam(C) away 
from each other. An e of |0 | )||0 , a gradient threshold of 
10~ 10 , and an R-Hop of 1 were provided to our SDDM 
solver for determining the approximate Newton direction. 
We compared the performance of our algorithm, referred to 
SDDM-ADD hereafter, to ADD, standard gradient descent, 
and the exact Newton method (i.e., centralized Newton 
iterations). The values of the primal objective and feasibility 
were chosen as performance metric. 
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Fig. 1. Performance metrics on two randomly generated networks, showing the primal objective, / (*&), and feasibility ||Aajfc — 6|| as a function of the 
number of iterations k. On a relatively small network (i.e., 30 nodes and 70 edges) we outperform ADD and gradient descent by approximately an order 
of magnitude. On larger networks (i.e., 90 nodes and 200 edges), SDDM-ADD is superior to both ADD and gradient descent, where the primal objective 
of the latter two algorithms converges to 10 5 after 3000 iterations. It is also worth noting that we perform closely to the exact Newton method computed 
according to a centralized approach. 


Figure Q] shows these convergence metrics comparing 
SDDM-ADD, to ADD [6], standard gradient descent, and 
the exact Newton method (i.e., centralized Newton iteration). 
On relatively small networks, 30 nodes and 70 edges, our 
approach converges approximately an order of magnitude 
faster compared to both ADD and gradient descent as 
demonstrated in Figures |l(a)| and |l(b)[ It is also clear that 
on such networks, SDDM-ADD is capable of closely tracing 
the exact Newton method where convergence to the optimal 
primal objective is achieved after « 200 iterations compared 
to ss 500 for ADD and « 2000 for gradient descent. 

In the second set of experiments that goal was to eval¬ 
uate the performance of SDDM-ADD on large networks 
where both ADD and gradient descent underperform. Results 
reported in Figures |l(c)| and |l(d)| on the larger 90 nodes 
and 200 edges network clearly demonstrate the effective¬ 
ness of our approach. Benefiting from the approximation 
accuracy of the Newton direction, SDDM-ADD is capable 
of significantly outperforming state-of-the-art methods. As 
shown in Figure [T(dj| convergence to the optimal solution (as 
computed by exact Newton iterations) is achieved after 3000 
iterations, while ADD and gradient descent underperform by 
converging to a primal value of 10 5 . 

VII. CONCLUSIONS 

In this paper we proposed a fast and accurate distributed 
Newton method for network flow optimization problems. Our 
approach utilizes the sparsity pattern of the dual Hessian 
to approximate the Newton direction using only local infor¬ 
mation. We achieve e-close approximations by proposing a 
novel distributed solver for symmetric diagonally dominant 
systems of linear equations involving M-matrices. Our solver 
provides a distributed implementation of the algorithm of 
Spielam and Peng by considering an approximate inverse 
chain that can be computed in a distributed fashion. 

The proposed approximate Newton method utilizes the 
distributed solver to obtain e-close approximations to the 
exact Newton direction up-to any arbitrary e > 0. We 
further analyzed the properties of the resulting approximate 
algorithm showing that, similar to conventional Newton 
methods, superlinear convergence within a neighborhood of 


the optimal value can be attained. Finally, we demonstrated 
the effectiveness of our method in a set of experiments on 
randomly generated networks. Results showed that on both 
small and large networks, our algorithm, outperforms state- 
of-the-art techniques in a variety of convergence metrics. 

Possible extensions include applications to network utility 
maximization [7], general wireless communication optimiza¬ 
tion problems [14], and stochastic settings [15]. 
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Appendix 

The complete proofs can be found at: 
https://db.tt/MbBW15Zx 

















































